Cyclic policy distillation: Sample-efficient sim-to-real reinforcement learning with domain randomization
Abstract
Deep reinforcement learning with domain randomization learns a control policy in various simulations with randomized physical and sensor model parameters, so that the policy becomes transferable to the real world in a zero-shot setting. However, a huge number of samples is often required to learn an effective policy when the randomization range is extensive, owing to the instability of policy updates. To alleviate this problem, we propose a sample-efficient method named cyclic policy distillation (CPD). CPD divides the randomization range into several small sub-domains and assigns a local policy to each one. The local policies are then learned while cyclically transitioning between sub-domains, and CPD accelerates this learning through knowledge transfer based on expected performance improvements. Finally, all of the learned local policies are distilled into a global policy for sim-to-real transfer. CPD's effectiveness and sample efficiency are demonstrated through simulations of four tasks (Pendulum from OpenAI Gym, and Pusher, Swimmer, and HalfCheetah from MuJoCo) and a real-robot ball-dispersal task. We have published the code and videos of our experiments at https://github.com/yuki-kadokawa/cyclic-policy-distillation.
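The overall shape of the pipeline described above (partition the randomization range, visit the sub-domains cyclically, then distill the local policies into one global policy) can be sketched as follows. This is a minimal illustration under loudly stated assumptions, not the authors' implementation: it assumes a single scalar randomized parameter split into equal-width sub-domains, uses hypothetical toy linear "local policies" in place of RL-trained ones, omits the knowledge-transfer rule based on expected performance improvements entirely, and uses plain least-squares regression as the distillation step. All function names (`split_range`, `cyclic_schedule`, `make_local_policy`, `features`, `distill`) are hypothetical.

```python
import numpy as np

def split_range(low, high, n_subdomains):
    """Divide one randomized parameter range [low, high] into equal-width sub-domains.

    Assumption: a single scalar parameter and equal-width splitting, chosen only
    for illustration.
    """
    edges = np.linspace(low, high, n_subdomains + 1)
    return [(float(edges[i]), float(edges[i + 1])) for i in range(n_subdomains)]

def cyclic_schedule(n_subdomains, n_rounds):
    """Order in which sub-domains are visited: 0, 1, ..., n-1, 0, 1, ..."""
    return [r % n_subdomains for r in range(n_rounds)]

def make_local_policy(subdomain):
    """Hypothetical stand-in for a trained local policy: a linear feedback gain
    tuned to the midpoint of its sub-domain (no RL training happens here)."""
    mid = 0.5 * (subdomain[0] + subdomain[1])
    return lambda s, p: -(1.0 + mid) * s

def features(s, p):
    # Global-policy inputs: state, randomized parameter, their product, and a bias.
    return np.stack([s, p, s * p, np.ones_like(s)], axis=1)

def distill(local_policies, subdomains, n_samples=500, seed=0):
    """Fit one global policy to the actions of every local policy on its own
    sub-domain by least-squares regression (a simple form of policy distillation)."""
    rng = np.random.default_rng(seed)
    xs, ys = [], []
    for policy, (lo, hi) in zip(local_policies, subdomains):
        p = rng.uniform(lo, hi, n_samples)      # sampled physical parameter
        s = rng.uniform(-1.0, 1.0, n_samples)   # toy one-dimensional state
        xs.append(features(s, p))
        ys.append(policy(s, p))                 # "teacher" actions to imitate
    w, *_ = np.linalg.lstsq(np.concatenate(xs), np.concatenate(ys), rcond=None)
    return lambda s, p: features(s, p) @ w

subdomains = split_range(0.5, 2.0, 4)
local_policies = [make_local_policy(d) for d in subdomains]
for round_idx, k in enumerate(cyclic_schedule(len(subdomains), 8)):
    # A real implementation would run RL updates for local policy k here.
    print(f"round {round_idx}: train local policy {k} on {subdomains[k]}")
global_policy = distill(local_policies, subdomains)
```

Because each toy local policy only uses a constant gain for its own sub-domain, the distilled global policy has to blend them into one gain that varies with the parameter; swapping in real local policies, a different partition, or the paper's own transfer and distillation rules would change the numbers but not this overall structure.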
Similar sources
Sample Efficient Reinforcement Learning with Gaussian Processes
This paper derives sample complexity results for using Gaussian Processes (GPs) in both model-based and model-free reinforcement learning (RL). We show that GPs are KWIK learnable, proving for the first time that a model-based RL approach using GPs, GP-Rmax, is sample efficient (PAC-MDP). However, we then show that previous approaches to model-free RL using GPs take an exponential number of step...
Sim-to-Real Transfer of Robotic Control with Dynamics Randomization
Simulations are attractive environments for training agents as they provide an abundant source of data and alleviate certain safety concerns during the training process. But the behaviours developed by agents in simulation are often specific to the characteristics of the simulator. Due to modeling error, strategies that are successful in simulation may not transfer to their real world counterpa...
Flexible Robotic Grasping with Sim-to-Real Transfer based Reinforcement Learning
Robotic manipulation requires a highly flexible and compliant system. Task-specific heuristics are usually not able to cope with the diversity of the world outside of specific assembly lines and cannot generalize well. Reinforcement learning methods provide a way to cope with uncertainty and allow robots to explore their action space to solve specific tasks. However, this comes at a cost of hig...
Application of Reinforcement Learning to Batch Distillation
An important amount of work exists on the topic of optimal operation and control of batch distillation though it is still based on the assumption of an accurate process model being available. While this assumption is valid from a theoretical point of view, there will always remain the challenge of practical applications. Reinforcement Learning (RL) has been recognised already as a particularly ...
Safe and Efficient Off-Policy Reinforcement Learning
In this work, we take a fresh look at some old and new algorithms for off-policy, return-based reinforcement learning. Expressing these in a common form, we derive a novel algorithm, Retrace(λ), with three desired properties: (1) it has low variance; (2) it safely uses samples collected from any behaviour policy, whatever its degree of “off-policyness”; and (3) it is efficient as it makes the b...
Journal
Journal title: Robotics and Autonomous Systems
Year: 2023
ISSN: 0921-8890, 1872-793X
DOI: https://doi.org/10.1016/j.robot.2023.104425